A Route Confidence Evaluation Method for Reliable Hierarchical Text Categorization

Authors

  • Nima Hatami
  • Camelia Chira
  • Giuliano Armano
Abstract

Hierarchical Text Categorization (HTC) is becoming increasingly important with the rapidly growing amount of text data available on the World Wide Web. Among the strategies proposed to cope with HTC, the Local Classifier per Node (LCN) approach attains good performance by mirroring the underlying class hierarchy while enforcing a top-down strategy in the testing step. However, the problem of embedding hierarchical information (parent-child relationships) to improve the performance of HTC systems remains open. A confidence evaluation method for a selected route in the hierarchy is proposed to evaluate the reliability of the final candidate labels in an HTC system. To exploit the information embedded in the hierarchy, weight factors are used to reflect the importance of each level. An acceptance/rejection strategy in the top-down decision-making process is proposed, which improves the overall categorization accuracy by rejecting a small percentage of samples, i.e., those with a low reliability score. Experimental results on the Reuters benchmark dataset (RCV1v2) confirm the effectiveness of the proposed method compared to other state-of-the-art HTC methods.
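
The abstract does not give the scoring formula, so the following is only a minimal sketch of how a route confidence with level weights and a rejection threshold might look. It assumes that each local classifier on the selected top-down route emits a score in [0, 1], that the route confidence is the weighted average of those scores, and that a sample is rejected when this confidence falls below a threshold. The function names, the weighted-average form, and the default threshold are illustrative assumptions, not the authors' definitions.

```python
from typing import Sequence


def route_confidence(node_scores: Sequence[float],
                     level_weights: Sequence[float]) -> float:
    """Weighted average of per-level classifier scores along one route.

    node_scores[i] is the (assumed) confidence of the local classifier
    chosen at level i of the top-down route; level_weights[i] is the
    importance assigned to that level. Both are hypothetical inputs.
    """
    if len(node_scores) != len(level_weights):
        raise ValueError("one weight per level is required")
    total_weight = sum(level_weights)
    if total_weight <= 0:
        raise ValueError("level weights must sum to a positive value")
    weighted_sum = sum(w * s for w, s in zip(level_weights, node_scores))
    return weighted_sum / total_weight


def accept_route(node_scores: Sequence[float],
                 level_weights: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the final label only if the route confidence reaches the threshold."""
    return route_confidence(node_scores, level_weights) >= threshold


if __name__ == "__main__":
    weights = [3.0, 2.0, 1.0]        # assumed: upper levels weigh more
    confident = [0.9, 0.8, 0.6]      # scores along a confident route
    doubtful = [0.6, 0.5, 0.4]       # scores along a doubtful route
    print(accept_route(confident, weights, threshold=0.7))
    print(accept_route(doubtful, weights, threshold=0.7))
```

In this sketch, raising the threshold rejects more samples and leaves only routes with higher confidence, which is the accuracy/rejection trade-off the abstract describes.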


Similar resources

Hierarchical text categorization using fuzzy relational thesaurus

Text categorization is the task of assigning a text document to an appropriate category in a predefined set of categories. We present a new approach to text categorization by means of a Fuzzy Relational Thesaurus (FRT). FRT is a multilevel category system that stores and maintains an adaptive local dictionary for each category. The goal of our approach is twofold: to develop a reliable t...


Systematic Construction of Hierarchical Classifier in SVM-Based Text Categorization

In a text categorization task, classification over a hierarchy of classes gives better results than classification without the hierarchy. In current environments, where large numbers of documents are divided into several subgroups with a hierarchy between them, it is more natural and appropriate to use a hierarchical classification method. We introduce a new internal node evaluation scheme which is ve...


An effective procedure for constructing a hierarchical text classification system

In text categorization tasks, classification over a class hierarchy gives better results than classification without the hierarchy. Because large numbers of documents are now divided into several subgroups in a hierarchy, a hierarchical classification method is appropriate. However, we have no systematic method to build a hierarchical classification system that performs well with l...


Hierarchical Text Categorization and Its Application to Bioinformatics

In a hierarchical categorization problem, categories are partially ordered to form a hierarchy. In this dissertation, we explore two main aspects of hierarchical categorization: learning algorithms and performance evaluation. We introduce the notion of consistent hierarchical classification that makes classification results more comprehensible and easily interpretable for end-users. Among the p...


Hierarchical Text Categorization Using Fuzzy Relational Thesaurus (Kybernetika)

Text categorization is the classification to assign a text document to an appropriate category in a predefined set of categories. We present a new approach for the text categorization by means of Fuzzy Relational Thesaurus (FRT). FRT is a multilevel category system that stores and maintains adaptive local dictionary for each category. The goal of our approach is twofold: to develop a reliable text cat...



Journal:
  • CoRR

Volume abs/1206.0335

Publication date: 2012